Twenty (or so) Questions: D-ary Bounded-Length Huffman Coding
Author
Abstract
The game of Twenty Questions has long been used to illustrate binary source coding. Recently, a physical device has been developed that mimics the process of playing Twenty Questions, with the device supplying the questions and the user providing the answers. However, this game differs from Twenty Questions in two ways: answers need not be only “yes” and “no,” and the device continues to ask questions beyond the traditional twenty; typically, at least 20 and at most 25 questions are asked. The nonbinary variation on source coding is well known and understood, but not with such bounds on length. An $O(n(l_{\max}-l_{\min}))$-time, $O(n)$-space Package-Merge-based algorithm is presented here for $D$-ary (binary or nonbinary) source coding with all $n$ codeword lengths (numbers of questions) bounded to lie within the interval $[l_{\min}, l_{\max}]$. This algorithm minimizes average codeword length or, more generally, any other quasiarithmetic convex coding penalty. In the case of minimizing average codeword length, both time and space complexity can be improved via an alternative graph-based reduction. This has, as a special case, a method for nonbinary length-limited Huffman coding, which was previously solved via dynamic programming with $O(n^2 l_{\max} \log D)$ time and $O(n \log D)$ space. These algorithms can also be used to efficiently find a code that is optimal given a limit on fringe, the difference between the lengths of the longest and shortest codewords, a problem previously without a polynomial-time solution.
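The algorithm above builds on the Package-Merge technique. For orientation only, the following is a minimal Python sketch of the classical binary, length-limited special case ($D = 2$, an upper limit $l_{\max}$, and effectively $l_{\min} = 1$). It is not the $D$-ary, $[l_{\min}, l_{\max}]$-bounded algorithm described in the abstract, and the function name `package_merge` is hypothetical. Each item is a weight together with the list of symbols it covers. At each level, adjacent items are packaged in pairs and merged back with the original leaves. The $2n-2$ cheapest items of the final list then determine the codeword lengths.

```python
def package_merge(weights, max_len):
    """Optimal binary codeword lengths subject to every length <= max_len.

    Sketch of the classical binary length-limited Package-Merge
    (coin-collector form); NOT the paper's D-ary [l_min, l_max] algorithm.
    """
    n = len(weights)
    if n > (1 << max_len):
        raise ValueError("need n <= 2**max_len for a feasible binary code")
    # Each item is (total weight, list of symbol indices it contains).
    order = sorted(range(n), key=lambda i: weights[i])
    leaves = [(weights[i], [i]) for i in order]
    current = leaves
    # Package adjacent pairs and merge them back with the leaves,
    # once for each level below the deepest one (max_len - 1 times).
    for _ in range(max_len - 1):
        packages = [
            (current[k][0] + current[k + 1][0], current[k][1] + current[k + 1][1])
            for k in range(0, len(current) - 1, 2)
        ]
        # sorted() is stable, so on ties leaves stay ahead of packages.
        current = sorted(leaves + packages, key=lambda item: item[0])
    # Select the 2n - 2 cheapest items; a symbol's codeword length is the
    # number of selected items that contain it.
    lengths = [0] * n
    for _, symbols in current[: 2 * n - 2]:
        for s in symbols:
            lengths[s] += 1
    return lengths
```

For example, the unconstrained Huffman lengths for weights `[1, 1, 2, 4]` are `[3, 3, 2, 1]`, whereas `package_merge([1, 1, 2, 4], 2)` returns `[2, 2, 2, 2]`, the best code when no answer sequence may exceed two questions. This version copies symbol lists for clarity, so it does not attain the time and space bounds quoted above.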
Similar resources
Twenty (or so) Questions: Bounded-Length Huffman Coding
The game of Twenty Questions has long been used to illustrate binary source coding. Recently, a physical device has been developed which mimics the process of playing Twenty Questions, with the device supplying the questions and the user providing the answers. However, this game differs from Twenty Questions in two ways: Answers need not be only “yes” and “no,” and the device continues to ask q...
Twenty (or so) Questions: $D$-ary Length-Bounded Prefix Coding
Efficient optimal prefix coding has long been accomplished via the Huffman algorithm. However, there is still room for improvement and exploration regarding variants of the Huffman problem. Length-limited Huffman coding, useful for many practical applications, is one such variant, for which codes are restricted to the set of codes in which none of the n codewords is longer than a given length, ...
Tight Bounds on the Redundancy of Huffman Codes
Consider a discrete finite source with $N$ symbols, and with the probability distribution $p := (u_1, u_2, \ldots, u_N)$. It is well known that the Huffman encoding algorithm [1] provides an optimal prefix code for this source. A $D$-ary Huffman code is usually represented using a $D$-ary tree $T$, whose leaves correspond to the source symbols; the $D$ edges emanating from each intermediate node of $T$ are lab...
Twenty Questions Games Always End With Yes
Huffman coding is often presented as the optimal solution to Twenty Questions. However, a caveat is that Twenty Questions games always end with a reply of “Yes,” whereas Huffman codewords need not obey this constraint. We bring resolution to this issue, and prove that the average number of questions still lies between H(X) and H(X) + 1.
Analysis of parameters of trees corresponding to Huffman codes and sums of unit fractions
For fixed t ≥ 2, we consider the class of representations of 1 as sum of unit fractions whose denominators are powers of t or equivalently the class of canonical compact t-ary Huffman codes or equivalently rooted t-ary plane “canonical” trees. We study the probabilistic behaviour of the height (limit distribution is shown to be normal), the number of distinct summands (normal distribution), the...
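Several of the related entries above rest on three standard facts. First, a $D$-ary Huffman code is built by repeatedly merging the $D$ least probable nodes, after padding with zero-probability dummy leaves so that the tree is full. Second, its codeword lengths $l_i$ satisfy the Kraft inequality $\sum_i D^{-l_i} \le 1$, a sum of unit fractions whose denominators are powers of $D$. Third, its average length lies between the base-$D$ entropy and that entropy plus one. The minimal Python sketch below illustrates all three. The function names `dary_huffman_lengths` and `entropy` are hypothetical, and the sketch computes only codeword lengths, not the codewords themselves.

```python
import heapq
import math
from itertools import count


def dary_huffman_lengths(probs, D):
    """Codeword lengths of a D-ary Huffman code (D >= 2); sketch only."""
    n = len(probs)
    if n == 1:
        return [0]  # a single outcome needs no questions
    tiebreak = count()  # unique counter so the heap never compares lists
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    # Add zero-probability dummy leaves so that (leaves - 1) % (D - 1) == 0,
    # which lets every merge combine exactly D nodes into a full D-ary tree.
    pad = (-(n - 1)) % (D - 1)
    heap += [(0.0, next(tiebreak), []) for _ in range(pad)]
    heapq.heapify(heap)
    lengths = [0] * n
    while len(heap) > 1:
        total, members = 0.0, []
        for _ in range(D):  # merge the D least probable nodes
            p, _tie, symbols = heapq.heappop(heap)
            total += p
            members += symbols
        for s in members:  # every real symbol under the new node moves one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (total, next(tiebreak), members))
    return lengths


def entropy(probs, D):
    """Source entropy in base-D units (D-ary digits per symbol)."""
    return -sum(p * math.log(p, D) for p in probs if p > 0)


if __name__ == "__main__":
    probs = [0.4, 0.3, 0.2, 0.1]
    lengths = dary_huffman_lengths(probs, 3)
    avg = sum(p * l for p, l in zip(probs, lengths))
    kraft = sum(3.0 ** -l for l in lengths)
    print(lengths, avg, entropy(probs, 3), kraft)
```

With ternary answers ($D = 3$) and probabilities `[0.4, 0.3, 0.2, 0.1]`, one dummy leaf is added and the lengths come out as `[1, 1, 2, 2]`. The average length is about 1.3, which lies between the base-3 entropy (about 1.165) and that entropy plus one. The Kraft sum is $8/9$, strictly below 1 because of the dummy leaf.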